A Sequence Labelling Approach to Quote Attribution
Abstract
Quote extraction and attribution is the task of automatically extracting quotes from text and attributing each quote to its correct speaker. The current state-of-the-art system relies on features built from gold-standard information about previous decisions; removing that information causes a large drop in performance. We treat the problem as a sequence labelling task, which allows us to incorporate sequence features without using gold-standard information. We present results on two new corpora and an augmented version of a third, achieving a new state of the art for systems using only realistic features.
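To illustrate the idea of labelling quotes jointly rather than one at a time, here is a minimal Viterbi decoding sketch. It is not the authors' model: the speaker labels `A` and `B`, the `emissions` and `transitions` tables, and every score in them are made-up assumptions chosen only to show how a transition score between neighbouring quotes can replace a feature that would otherwise need the gold label of the previous quote.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label sequence.

    emissions[i][label]      local score of `label` for quote i
    transitions[prev][label] score of `label` directly following `prev`
    """
    if not emissions:
        return []
    labels = list(emissions[0])
    # best[i][label]: score of the best path ending in `label` at quote i
    best = [{lab: emissions[0][lab] for lab in labels}]
    back = []  # back[i-1][label]: best previous label for `label` at quote i
    for scores in emissions[1:]:
        row, ptr = {}, {}
        for lab in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][lab])
            row[lab] = best[-1][prev] + transitions[prev][lab] + scores[lab]
            ptr[lab] = prev
        best.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical two-speaker dialogue with three quotes (all scores invented).
emissions = [
    {"A": 2.0, "B": 0.0},  # quote 0: clearly speaker A
    {"A": 0.6, "B": 0.4},  # quote 1: ambiguous, slightly favours A
    {"A": 2.0, "B": 0.0},  # quote 2: clearly speaker A
]
# Assumed preference for speakers to alternate between adjacent quotes.
transitions = {
    "A": {"A": 0.0, "B": 1.0},
    "B": {"A": 1.0, "B": 0.0},
}

decoded = viterbi(emissions, transitions)
independent = [max(scores, key=scores.get) for scores in emissions]
print(decoded, independent)
```

With these invented scores, choosing each quote's label independently assigns all three quotes to `A`, while joint decoding assigns the ambiguous middle quote to `B` because the alternation score outweighs its weak local preference. The decoder only ever consults its own hypotheses about earlier quotes, never their gold labels.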
Related papers
Examining the Impact of Coreference Resolution on Quote Attribution
Quote attribution is the task of identifying the speaker of each quote within a document. While recent research has established large-scale corpora for this task, these corpora are not yet consistent in the way they handle candidate speakers, and many of the reported results rely on gold standard annotations of both entities and coreference chains. In this work we evaluate three quote attributi...
A Two-stage Sieve Approach for Quote Attribution
We present a deterministic sieve-based system for attributing quotations in literary text and a new dataset: QuoteLi3. Quote attribution, determining who said what in a given text, is important for tasks like creating dialogue systems, and in newer areas like computational literary studies, where it creates opportunities to analyze novels at scale rather than only a few at a time. We release Q...
K-best Viterbi Semi-supervized Active Learning in Sequence Labelling
In application domains where there exists a large amount of unlabelled data but obtaining labels is expensive, active learning is a useful way to select which data should be labelled. In addition to its traditional successful use in classification and regression tasks, active learning has also been applied to sequence labelling. According to the standard active learning approach, sequences for ...
Constraint Satisfaction Inference: Non-probabilistic Global Inference for Sequence Labelling
We present a new method for performing sequence labelling based on the idea of using a machine-learning classifier to generate several possible output sequences, and then applying an inference procedure to select the best sequence among those. Most sequence labelling methods following a similar approach require the base classifier to make probabilistic predictions. In contrast, our method can b...
Whose Line Is It? – Quote Attribution through Recurrent Neural Networks
This paper presents a recurrent neural network framework for the problem of attributing spoken lines to characters in a screenplay or novel. We study these quotes as a sequence in the absence of additional context, e.g. descriptions of scenes or actions, from the text surrounding them. Instead, attributions may only be made on the basis of learned expectations for how each character speaks, as ...